Achieving non-discrimination in prediction
Abstract
Discrimination-aware classification is receiving increasing attention in the data mining and machine learning fields. Data preprocessing methods for constructing a discrimination-free classifier remove discrimination from the training data and then learn the classifier from the cleaned data. However, these methods lack a theoretical guarantee for their performance. In this paper, we fill this theoretical gap by mathematically bounding the probability that the discrimination in predictions is within a given interval, in terms of the given training data and classifier. In our analysis, we adopt the causal model to model the mechanisms of data generation and to formally define discrimination in the population, in a dataset, and in the prediction. The theoretical results show that the fundamental assumption made by the data preprocessing methods is not correct. Finally, we develop a framework for constructing a discrimination-free classifier with a theoretical guarantee.
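The distinction the abstract draws between discrimination in a dataset and discrimination in the prediction can be illustrated with a minimal sketch. It assumes the risk difference between two groups as the discrimination measure, which is a simplification chosen here for illustration and not necessarily the causal-model definition used in the paper. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def risk_difference(protected: Sequence[int], outcome: Sequence[int]) -> float:
    """Return P(outcome=1 | protected=1) - P(outcome=1 | protected=0).

    Assumption: discrimination is measured as a plain risk difference;
    this is an illustrative stand-in, not the paper's formal definition.
    """
    in_group = [y for c, y in zip(protected, outcome) if c == 1]
    out_group = [y for c, y in zip(protected, outcome) if c == 0]
    if not in_group or not out_group:
        raise ValueError("both groups must be non-empty")
    return sum(in_group) / len(in_group) - sum(out_group) / len(out_group)


# Hypothetical toy data: group membership, training labels, and the
# predictions of some classifier trained on those labels.
group = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

disc_in_data = risk_difference(group, labels)
disc_in_predictions = risk_difference(group, predictions)
print(disc_in_data, disc_in_predictions)
```

In this toy case the training labels show no risk difference while the predictions do, which is the kind of gap between cleaned data and resulting predictions that the abstract argues preprocessing methods overlook.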
Journal: CoRR
Volume: abs/1703.00060
Publication date: 2017